[6.2] Fix hang in ServiceConsoleTests.serviceShutdown #681

jakepetroules · 2025-07-28T19:30:37Z

This hang occurred only in CI environments and only on Linux. Here's the sequence of events:

- Test terminates swbuild using SIGKILL
- OS reparents SWBBuildService (a subprocess of swbuild) to launchd (Darwin) / init (others)
- OS closes the file descriptors for the I/O pipes swbuild has connected to SWBBuildService
- SWBBuildService's read() loop sees EOF because swbuild's end of the pipe has been closed
- SWBBuildService exits on its own
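
The EOF step in the sequence above can be illustrated with a plain pipe. This is a minimal Python sketch, not code from SWBBuildService: `read_until_eof` and `demo_pipe_eof` are hypothetical names, and closing the write end by hand stands in for the OS closing the killed parent's descriptors.

```python
import os


def read_until_eof(fd: int, chunk_size: int = 4096) -> bytes:
    """Read from fd until every write end of the pipe has been closed."""
    chunks = []
    while True:
        data = os.read(fd, chunk_size)
        if not data:
            # An empty read means EOF: no writer holds the pipe open any more.
            # A service in this position would now shut itself down.
            break
        chunks.append(data)
    return b"".join(chunks)


def demo_pipe_eof() -> bytes:
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"request")
    # Stand-in for the OS closing the descriptors of the killed parent process.
    os.close(write_fd)
    try:
        return read_until_eof(read_fd)
    finally:
        os.close(read_fd)
```

The loop returns as soon as the last writer disappears, so a process built around such a loop needs no explicit shutdown message to notice that its parent is gone.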

At this point, the getpgid call in the monitoring loop should fail with ESRCH and terminate the test. However, SWBBuildService lingers as a zombie for an extended period because init does not reap the pid, so getpgid keeps succeeding and the loop never reaches its termination state. This causes the test to hang indefinitely.
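
The zombie behaviour described above can be reproduced in a few lines. This is a hedged Python sketch that assumes Linux and a Python build providing `os.waitid`; the helper names are hypothetical, and it uses a direct child that the caller reaps itself, whereas in the PR's scenario the process has been reparented and reaping is up to init.

```python
import os


def getpgid_sees_process(pid: int) -> bool:
    """Return True while getpgid still succeeds for pid, False once it fails with ESRCH."""
    try:
        os.getpgid(pid)
    except ProcessLookupError:  # errno ESRCH
        return False
    return True


def demo_zombie_visibility() -> tuple[bool, bool]:
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it remains a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    seen_as_zombie = getpgid_sees_process(pid)
    os.waitpid(pid, 0)  # reap the zombie
    seen_after_reap = getpgid_sees_process(pid)
    return seen_as_zombie, seen_after_reap
```

On Linux the first value should be `True` and the second `False`: the pid stays visible to getpgid until someone collects the exit status, which is why a loop waiting for ESRCH depends on how promptly the parent reaps.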

The fix has two parts:

- A timeout is added around the termination monitoring loop that forces the exit promise to be fulfilled with an error if a 30-second interval elapses without the process exiting
- We switch from using a getpgid loop to using a waitid loop, where the terminal state is that the process has exited. We don't care whether the zombie has been collected by init, only that it's no longer in a running state
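
The two parts of the fix can be sketched together as a polling loop with a deadline. This is an illustrative Python sketch under stated assumptions, not the Swift code from this PR: `wait_for_exit` and its parameters are hypothetical, raising `TimeoutError` stands in for fulfilling the exit promise with an error, and the sketch watches a direct child of the caller.

```python
import os
import time


def wait_for_exit(pid: int, timeout: float = 30.0, poll_interval: float = 0.05) -> int:
    """Poll until child `pid` has exited and return its status.

    Raises TimeoutError if the process is still running after `timeout` seconds.
    WNOWAIT leaves the exit status uncollected, so a zombie counts as exited
    and reaping remains someone else's job.
    """
    deadline = time.monotonic() + timeout
    while True:
        # WNOHANG makes the call return None instead of blocking
        # while the child is still running.
        info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOHANG | os.WNOWAIT)
        if info is not None:
            return info.si_status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"process {pid} still running after {timeout} s")
        time.sleep(poll_interval)
```

Because the terminal condition is "has exited" rather than "pid no longer exists", the loop finishes even if the zombie is never collected, and the deadline bounds the wait in every other case.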

This fixes the hang for both the Jenkins-based CI and GitHub Actions, and also insulates us against future hangs by ensuring the test terminates with a timeout error instead of hanging indefinitely, so that we at least know which test is the problem.

jakepetroules · 2025-07-28T19:30:45Z

@swift-ci test

jakepetroules requested review from aciidgh, mhrawdon, mirza-garibovic, neonichu and owenv as code owners July 28, 2025 19:30

jakepetroules enabled auto-merge (rebase) July 28, 2025 19:31

neonichu approved these changes Jul 28, 2025

mhrawdon approved these changes Jul 28, 2025

jakepetroules merged commit 9b8690f into swiftlang:release/6.2 Jul 28, 2025
20 of 24 checks passed

jakepetroules deleted the eng/PR-hang-in-shutdown-release/6.2 branch July 28, 2025 21:17

3 participants
